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Abstract 

In a variety of applications involving longitudinal or repeated-measurements data, 
it is desired to uncover natural groupings or clusters which exist among study sub¬ 
jects. Motivated by the need to recover longitudinal trajectories of conduct problems 
in the field of developmental psychopathology, we propose a method to address this 
goal when the data in question are counts. We assume that the subject-specific obser¬ 
vations are generated from a first-order autoregressive process which is appropriate 
for counts. A key advantage of our approach is that the marginal distribution of 
the response can be expressed in closed form, circumventing common computational 
issues associated with random effects models. To further improve computational effi¬ 
ciency, we propose a quasi-EM procedure for estimating the model parameters where, 
within each EM iteration, the maximization step is approximated by solving an ap¬ 
propriately chosen set of estimating equations. Additionally, we introduce a novel 
method to express the degree to which both the underlying data and the fitted model 
are able to correctly assign subjects to their (unknown) latent classes. We explore 
the effectiveness of our procedures through simulations based on a four-class model, 
placing a special emphasis on posterior classification. Einally, we analyze data and 
recover trajectories of conduct problems in an important nationally representative 
sample. 
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1 Introduction 


Many longitudinal studies have the aim of tracking the change in some outcome or re¬ 
sponse over time. This is an important and common goal in the held of developmental 
psychopathology, which aims to study the natural history of common childhood psychiatric 
diseases such as conduct disorder and delinquency. Often, there exists substantial variabil¬ 
ity in the response-paths of observed trajectories across subjects, and grouping subjects 
with similar trajectories may reveal certain sub-populations that exhibit interesting devel¬ 
opmental patterns. In conduct disorder research, for example, such distinct “developmental 
sub-types” of trajectories are of great interest because the classihcation carries important 


informati on about leve l of im pai rment, 
(see, e.g. lOdgers et all (1200711 . or 


Moffitt 


uture life outcomes, and possible etiologic origin 
(1199311 1. Furthermore, it is of interest to robustly 


estimate such trajectory sub-types using large and representative samples, to do so in a 
computationally efficient way, and to use those estimates to recover class membership at the 
subject level. The problem of identifying a hnite number of sub-populations is frequently 
formulated as a latent class or hnite mixture model where the distribution governing the 
observations on a given subject is determined by an unobserved class label. 

In an alternative approach to the analysis of longitudinal data, random ehects are in¬ 
troduced to account for the heterogeneity across subjects and the correlation among obser¬ 
vations on the same subject. If the conditional distribution of the response given the values 
of the random ehects is not Gaussian however, the marginal distribution of the response 
will typically not have a closed form. In these cases, evaluation of the likelihood requires 
numerical integration over the distribution of the random ehects. Direct maximization of 
the likelihood then involves numerical integration for every evaluation of the likelihood. 
A number of other estimation approaches for models of this type, commonly referred to 



More recently, some authors have combined these two approaches. They have observed 
that with longitudinal data, a small number of latent classes is not sufficient to account for 
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all of the realized between-subject heterogeneity and correlation among observations within 
snbjects. They have therefore developed models with latent random effects in addition to a 
l atent vari able indicating c lass membership already present (see e.g. iMuthen and Shedden 


(jl999l) . or iQn et al.l (119961) h Whereas this approach has gained traction in the applied 
literatnre, it poses two potential methodological problems. First, the addition of within- 
class random effects to the latent class model may complicate compntation considerably. 
For example, if using an EM algorithm for estimation, not only does one confront within 
each iteration the difficulties associated with GLMMs, but one must also use numerical 
integration to update the class-membership probabilities within each iteration. A within- 
class closed form likelihood would be much more computationally tractable. 

The second problem involves the distinction between global and local correlation among 
observations on the same subject. To articulate this concern, we hrst posit that one key 
role of the latent class structure is to account for the global correlation, i.e., the correlation 
that exists between all observations on a subject, whether they are separated by a short or 
by a long time lag. In classic latent class analysis, given class membership, all observations 
on a given subject are independent, and this assumption drives the identihcation of the 
model. This assumption is, however, somewhat restrictive and there are concerns that it 
could lead to models with too many classes that are too small if there remains residual 
correlation among some of the observations within subject. An obvious solution is to 
allow—and model—in a restricted way some correlation among some observations. The 
introduction of random effects into growth mixture models attempts to address this need. A 
concern, however, is that random effects also account for a more global type of correlation, 
confounding the role of the latent classes and that of the within-class correlation structure in 
model identihability, htting, and testing when classes are not crisply separated. In contrast 
to random effects models, auto-regressive processes represent an alternative and more local 
source of within-subject correlation, allowing observations close together in time to be more 
strongly correlated than those further apart. Local correlation is not at all accounted for 
by the latent class structure. With a within-class local correlation model, the observations 
far apart in time will be nearly independent, strengthening model identihability. 

To address these issues, we propose a longitudinal latent class model for count data 
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which yields a closed form likelihood, accounts for local correlation among the repeated 
measures on a given subject, and allows for global association to be accounted for by the 
latent class structure. With our approach, correlations between observations far apart in 
time will be especially informative about class membership because the subject-specihc 
correlation in these cases will be negligible. 

Onr contributions in this paper are as follows. In the next section, we provide a tech¬ 
nical description of onr discrete data AR-1 process latent class trajectory model, followed 
in Section 3 with onr approach to estimating and making inferences on model parame¬ 
ters. There, we rely on a variation of the EM algorithm that exploits a general estimating 
function rather than the trne likelihood score fnnction. In Section 4, we propose a novel 
measnre of the inherent ability of latent class data to discriminate among classes for snb- 
jects in the population. To onr knowledge, this is a previously unexplored, bnt important 
construct in latent class analysis especially when snch analysis is used to to assign snbjects 
to their classes based on their manifest data. Because it is based on the trne data generating 
mechanism, our measure represents an upper bound on the ability for any fitted statistical 
model to perform class assignment. We also extend this measure to such htted models and 
modeling procedures for hnite data. Section 5 presents a simulation study with the aims 
of quantifying the statistical operating characteristics of our proposed model in terms of 
parameter estimation, bias, and conhdence interval coverage. We also quantify accuracy 
of class assignment. Additionally, we examine the ability of onr approach to snpport class 
assignment when the data generating mechanism is different from the one specihed by onr 
model. Finally, we examine a longitudinal study of conduct problems, and illustrate the 
use of our model for class dehnition and assignment in that study. 

2 Model Description 

2.1 Data Structure and Trajectory Model 

Let ji = {yn,... ,yini) be observed longitudinal connts associated with the subject. In 
total, we have measurements on m snbjects Y = (yi,... ,ym)- We have observations on 
snbject i at each of the time points (tn ,..., tmj, and we let yij denote the observation on 
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subject i at time Uj. For each subject and each observed time point, we also observe a p x 1 
covariate vector x^-, with Xj = (xji,..., Xj„J^ denoting the rij xp design matrix for subject 
i. In addition, each subject has an unobserved “latent class” Zi with Zi G C} 

indicating membership in one of C latent classes. 

The distribution of (yj|Xj, Zj = c) is governed by a vector of class-specihc parameters 
6c with p(yj|Xj, Z* = c) = Pi(yj; Be) denoting the distribution of y^ given covariates Xj and 
class label c. Observations made on different subjects are assumed to be independent, and 
the class labels (Zi,..., Z„) are assumed to be i.i.d. random variables with the vector of 
mixture proportions tt = (tti, ..., ttc) determining the class-membership proportions (i.e., 
F(Zj = c) = TTc) in the population. 

Conditional on a subject’s class, the mean-response curve or latent trajectory is 

F(yj|Zj = c,Xj) = EeXYi) = Mi = (Mu, ■ ■ ■ > (1) 

where Eg^{-) denotes taking expectation conditional on subject i belonging to class c. 

We relate the mean curve ^tj to the covariates through log(pfj) = x^/3^, where f3c = 
(/dj’,..., /dp) are the class-c regression coefficients. To allow for overdispersion, the variance 
function has the form Vaig^^i/ij) = with scale parameter (pc allowed to vary across 

classes. Due to our data-generating model (Section l2.2p . we must have (pc > 1. For this 
reason, we often write the scale parameter as 0c = 1 + lci where 7 c > 0. 


2.2 The INAR-(l) Negative Binomial Process 

Conditional on class-membership, the observations from subject i comprise a multivariate 
outcome with distribution Pi(yj; 0c), governed by the (p-l-2) x 1 class-specific parameter vec¬ 
tor Oc = (/3j, Qfc, (pcY■ The distribution of y* = (?/ii,..., pinj is modeled by assuming that 
the components pjj of y* arise from a first-order Markov process governed by Oc- The joint 
distribution of y* is built up directly through the transition function, p(?/jjjpjj_i, Xj; 0c), 
associated with the underlying process Pi(yj;0c) = p(2/ji|Xj; 0c) ^c)- 

The correlation structure of yj then arises from the various dependencies introduced by the 
Markov process. 

A stochastic process tailored specifically for count data is the integer- valued auto r egres- 
sive (INAR(l)-NB) process with negative binomial marginals described in iMcKenziel fjl986f) 
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and [Bockenholtl fjl999h . For a subject in class c, observations from the INAR(1)-NB pro¬ 


cess arise as follows: the hrst observation yn follows a negative binomial distribution with 
EeXVii) = Ki and Vene^iya) = 7c). We denote this by ya ~ NB+ 7 c)} 

meaning that yn has probability mass function 

'k + /i,"i/7c - 1 


PiVii = k) = 


k 


( ^ ] 

7c A 

Vl + 7c^ 

U + 7c7 


k>0. 


( 2 ) 


The subsequent values (?/^ 2 , • • • ,2/mJ are determined through 

Vij T 3 * * * 7 ^^7 


(3) 


where conditional on {yn, ..., ?/ij-i) and latent probabilities {qi 2 , ■ ■ ■, qim), 

Hr, Binomial(|/jj_i, gij), with the understanding that Hij = 0 whenever yij-i = 0. 
The success probabilities {qi 2 , ■ ■ ■, qim) are themselves independent random variables with 
qij ~ Beta{ac/iij_i/7c, (1 — Oic)l^ij-i/lc}, where Beta(a,/9) represents a Beta distribution 
with shape parameters a and /3. Because this implies that that E[Hijh\yij-i\ = acyij-i 
marginally over q^j, we must have ccc ^ [0,1] for each class. One may also note that 
given yij-i > 1 and class membership c, H^j follows a beta-binomial distribution with 
parameters {yij-i,acyij_i/'yc, (1 ~ ctc)ftij_i/7c). The innovation component is assumed 
to follow a NB|/rfj(l — etc), hp(l~ttc)(l+7c)} distribution where ( 1 ^ 2 , • • •, hm) mutually 
independent and where each lij is independent of the history {yn,... ,?/j j_i). 

Although the transition function nf'Vpj'V,; ^-i, X,;, ^cj assoc i ated w ith the INAR(1)-NB 
process does not have a simple closed form (see iBockenholtl (jl99911 ). it may be directly 


computed by using the fact that it is the convolution of a beta-binomial distribution and 
a negative binomial distribution. In addition, under our description of the INAR(1)-NB 
process, the marginal distribution (conditional on class membership) of yij is negative 
binomial with Eg^{yij) = and Va.ie^^yij) = -l- 7 c), and the within-class correlation 

structure of y* is first-order autoregressive. That is, for two observations yik and yij on 
the same subject, the within-class correlation is CoiieXUikiUij) = The conditional 

expectation of y^j given yij-i is a linear function of yij-i, 


HdciVijlViJ-l) ~ T 


(4) 
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and the conditional variance of yij given is given by 


Vare^(l/jj|l/jj_i) = 1 - + yij-Atj U - A: 


\ Ac + ViJ-l 




1 + 


(5) 


where A A = 

It is also worth mentioning that onr specihcation of the INAR(1)-NB process places the 
following restriction on the relation between the antocorrelation parameters and the latent 
trajectories: < minjj{/ifj_^//if^-,for each c. However, when all of the latent 

trajectories are reasonably smooth, this constraint is not especially restrictive as the valnes 
of will be close to one. 


3 Estimation 


Because a finite mixture model with a fixed number of components can easily be formulated 
as a “missing-data” problem, the EM algorithm provides an attractive estimation approach. 
In our model, if the individual class-membership labels Z = {Zi,..., Zn) were observed, 
the “complete-data” log-likelihood would be 

m C 

logL(©,7r;Y,Z) = EE l{Zi = c}(^log(7rc) -hlog{pi(yi;0c)}). (6) 

i=l c=\ 


Above, © = (01,..., Oc) where 6c = {(3c, etc, 1c) are the parameters associated with class 
c, and TT = (tti, ..., tiq) is the vector of mixture proportions. 

Given current, interation-Zc, estimates of parameters {&^^\7z each EM iteration 
obtains new parameter estimates by maximizing the current expectation of the complete- 
data log-likelihood, with the expectation being taken over the unobserved class labels, viz. 


(© 


(fc+l) = 


argmaxi^l logL(©, tt; Y, Z) Y, ©^'^^ 
e,n I 


C m 


argmax EE log(7rc) + log{pi(yi;0c})- (7) 

©,7r 


C=1 2=1 


Here, ,77^^^) is the current estimated posterior probability that subject i belongs 

to class c, namely 


= P{Z, = c y„ ©W, ttW) = 


T.s^sPi{yi-,6^J^^) 


( 8 ) 
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3.1 Estimating Equation Approach for Computation 


In each step of the EM algorithm, updating the class probabilities is straightforward be- 
cause the update is simply the average of the current posterior probabilities, ttc = 
m Sill However, to update the remaining parameters, we must maxi¬ 

mize K separate weighted log-likelihood functions 

m 

= argmaxy'lEic(©^''l7r^^))log{pi(yi;6»c)}, c=l,...,C'. (9) 

Because each such log-likelihood function is a sum over many complicated transition prob¬ 
abilities, implementing the maximization in dH]) may be challenging. 

Instead of updating the parameters by maximizing each of the weighted log-likelihood 
functions directly, we found that replacing the score function with a more manageable esti¬ 
mating function provides a less cumbersome approach. That is, letting Sj(0c) = d logpidy*; ^c )/dOc 
denote the score function, we suggest that, rather than solving 

m 

= o ( 10 ) 

i=l 

for each class, one instead solve 


5;^ wue"’\ = 0; tor c = 1 ,..., C, 


( 11 ) 


2 = 1 


where Ui{6c) forms an unbiased estimating function (i.e., EeJ\Ui{0c)\ = 0 ) for each class. 
Such an approach, where within each EM iteration the maximization step is approxi- 
mated by solving an estim ating equation, is similar to the estimation strategy described in 


Elashoff and RvanI ( 2004 1. 


Our choice of Ui{6c) relies on the extended quasi 
structing estimating equations that was proposed in 


ikelihood function proce dure for con- 


Hall and Severinil (119981 ). These esti¬ 


mating functions considerably simplify c omputation (i.e., by solvi ng (fTT]) l when compared 
to using score functions Si{0c)- Tailoring iHall and Severinil (119981) to the case of a log-link 
function and an AR(1) correlation structure yields the (p -|- 2) x 1 estimating function 


U^i0c) = 




xf ^(ac)A. - /x?) 


1 

4>c 

- (y, - - Hi) 

U?H6c) 


jSYi - Mi)^A”^/^(M?)R*~\«c)A~^/^(^)=)(yi - /x,^) - m 



(1 


2 ) 
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In ffT2l) . Aj(/x^) is the n, x rij matrix defined by = diag{/iii, ■ ■ ■ ,hmi} and Rj(ac) is 

the Hi X rii correlation matrix whose (/c, j) entry is Ri{ac)[k, j] = 

The equation determined by setting the p-component vector Ym=i to zero 


^ 5; XfAp(K)R-‘(aJAr‘'^(^')(y, - = 0, 


- 1 / 2 , 


(13) 


2 = 1 


corres ponds to the generalized estimating equation (GEE) described in IZeger and Liang 


()l986l) . In GEE, the autocorrelation parameter ac is first estimated separately and then 
plugged into flTSl) in order to solve for regression coefficients (3^. In contrast, solving 
(ITT]) requires that ac be estimated simultaneously with /3^. To solve (ITT]) for a fixed c, 
we first update f3c by solving c) = 0, using initial values for 

(pc, 0c)- Using this value of f3c and the initial over dispersion (j)c, ac is updated by solving 
YlT=i lU^,c(©*'^^ Tz'^^'^)uf\oc) = 0. The value of <pc can then be updated non-iteratively be¬ 
cause, given values of (/3^, ac), solving = 0 for 0c has a closed 

form. This procedure is repeated until convergence. 


3.2 Quasi-EM Procedure and Standard Errors 

Our altered EM algorithm, where the score function is replaced with another, more man¬ 
ageable estimating function, may be summarized as follows: 

1. Find initial estimates (Our initialization procedure is described in the 

appendix). 

2 . Gompute current estimated posterior probabilities for each class and 

subject. 

3. Update mixture proportions through = U ^ TUc(©*'^\ tt^^^). Update other 

parameters { 61 ,..., 6 k) by solving Yh=i U0c(®^^^ 7r^^0f/i(0c) = 0, for c = 1,..., G. 

4. Repeat steps (2) and (3) until convergence. 

Parameter estimates (©, tt) produced from the above iterative procedure may be viewed as 
a solution to the estimating equation G(0, tt) = Ym=i tt) = 0, where Gi (©,7r) is the 

(p-|-3)G — 1x1 vector defined as Gj(©, tt) = [vec(Vj), and where Vj is the (p-l-2) x C 
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matrix V* = [VFii(©, 7r)f/j(0i),., Wic{®, Tz)Ui{Oc)\ and bj is the (C — 1) x 1 vector 

b* = [Phii(©,7r) -TTi,. -t^k-iY■ 

If we let U^{6c] yi) be the component of Ui{6c), we can see that G(©, tt) = 0 forms 
an unbiased estimating equation for (©, tt) because the expectation of the {k, c) component 
of Vj is given by 


E\wU&,7r)U^{0,;y, 


= TTrE 


and the expectation of the element of bj is given by 


PeMrrk, 


p{yi 


U:{e,-y,) = T^^EeA UAOc;y^) = 0 , (14) 


E\ tt) -tiA = TIcE 


PeAyi) 

p{yi) 


- TT^ = 0. 


(15) 


Conditions under which solutions of unbiased estimating equations are asymptotically nor¬ 


mal (a: 
12.4 of 


ter normalizi ng a ppropriately) a. re discussed in a number of sources (see e.g., section 


Hevdd fjl997l) . or ICrowderl fjl986l) ). By applying the typical results regarding asymp- 


--— 1/2 

totic normality to our setting, we have that S |(©,7r)^ — (©,7r)'^} — N{0,lqK-i) 
as m —> oo, where S is given by 


ni 11 1 ' '' 1 ^ 

2= (-E^{DGi(e,*)})" (-5^G,(e,*)Gi(0,*f)(-5^£{DG,(e,*)})^ 

2 = 1 2=1 2=1 

(16) 

In (fT6|) . DGj(©,7r) is the ((p -|- ?>)C — l) X ((p -|- 3)C — l) matrix of partial derivatives 
DG,(6>,7r) = aG,(©,7r)/a(©,7r). We compute standard errors using the diagonal elements 

of S. 


4 Class Assignment and Discrimination Measures 

4.1 Class Separation Index and Expected Empirical Discrimina¬ 
tion 

After estimating latent trajectories for each class, one often wishes to go back and assign 
or classify subjects based on their empirical resemblance to one of the estimated trajec¬ 
tories. Even though all the model parameters may be estimated accurately, however, the 
associated classihcation procedure may not have a high rate of correct assignment. Poor 
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class-discrimination may be due to little underlying class-separation between the latent 
trajectories or to high levels of noise 0c in the response. In this case, the ability to 
correctly classify subjects is limited by the class separation inherent in the underlying true 
data generating model. To quantify this inherent class separation, we propose a novel class 
separation index (CSI) which provides a rough upper bound on one’s ability to correctly 
assign subjects to classes based on our or any other latent class model. To accomplish this, 
the CSI measures the classihcation performance that would be attained if the underlying 
true generative model were known. 

To £x notation, consider data of the form Y = (yi,..., y^) and let 
Ac(y) = [ai(Y),..., am(Y)]^ be a procedure which maps data Y into an m x C matrix of 
nonnegative entries whose rows sum to one; the {i, c) entry of this matrix can be thought 
of as a reported probability that subject i belongs to class c. For instance, we could 
have klc(Y) = W^(0,7r;Y), where W^(©,7r;Y) = [w^(0, tt; yi),..., w^(©, tt; y^)f 
denotes the mxC matrix whose row, w*"(0, tt; y^)^, contains the posterior probabilities 
for subject i computed with the parameters (0, tt). Alternatively, we could have AciY) = 
V{r],Y), where V{r],Y) denotes a matrix of class membership probabilities computed 
under an incorrect working model with specihed parameter rj. Or, we could have Ac{Y) = 
V{r]{Y),Y), where i]{Y) is an estimate of r/ computed with the available data Y. 

To measure how well a procedure Ac(Y) predicts class membership, we will use a 
discrimination index D, which, given class labels Z = (Zi,..., Zm), returns a score in [0,1] 
such that D(7i, Ac'(Y)) = 1 if Ac{Y) has a 1 in the correct cell for all observations (rows), 
and is less than or equal to one otherwise, and for any ■) considered, values closer 
to one will imply better classihcation performance. For instance, D(-, •) often has a form 
similar to a t/-statistic of order C with kernel h, namely 


D(Z,Ac(Y)) = 


N,---Nc 






an(Y) 


Zic 

a.c(Y) 


(17) 


where Nc = = c} and where the summation is performed over all values of 

(A,..., ic) such that no two members of (0,..., ic) are the same and each 1 < ij < m 
which means that the summation in flT7)l is performed over (p)C! terms. Examples of 
such as the c-statistic and the polytomous discrimination index are discussed in 
Section 14.21 
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The class separation index may be defined for any latent class model having responses 
Yi, latent class indicators Zi and whose data generating mechanism is governed by the 
parameters (©, tt). In particnlar, for a choice of index D{-, •), the CSI is defined as 

CSI = lim E|T)(Z,W^(©,7r;Y))|, (18) 


provided that the above limit exists. Henristically, the CSI is simply the expected dis¬ 
crimination that would be obtained in the long run by an oracle using knowledge of both 
the true data-generating model and the true parameter values to compute the posterior 
probabilities of class membership. Hence, the CSI is a population quantity measuring the 
underlying separation of the classes and depends on neither the sample size nor on the 
particular way the data have been analyzed. For example, with an index of the form 

ffT7|l . the CSI would be 


CSI 


-^- clh 

TTf ■■TTc 


Zi 

w'^(0,7r;yi) 


Zr 

w'^(©,7r;yc) 


(19) 


Turning to finite samples, the realized or empirical discrimination resulting from using 
procedure Hc(Y) is Z)(Z, Hc'(Y)), and the expectation of this quantity is what we define 
to be the expected empirical discrimination (FED), namely 


EED = E[D{Z,Ac{Y))y ( 20 ) 

where the expectation in (120|) is taken over (Z, Y) for a given sample size m, under the true 
data generating model. Note that the EED may be computed for any procedure Ac(Y) 
even when, for example, Ac{Y) corresponds to randomly generated predictions or contains 
predicted probabilities computed under an incorrect working model. Hence, comparing 
the CSI and the EED may serve as a robustness check in the sense that the CSI may be 
calculated for any number of hypothetical alternative models, and for each of these, the 
EED may be computed using Hc'(Y) from the assumed working model. For example, we 
could compute the CSI under a model with Poisson responses and Normal random effects 
using (fT8|) and compare it with the EED obtained (taking expectation under the Poisson- 
Normal model) by using our INAR(1)-NB model to find posterior probabilities of class 
membership. We explore such a comparison with a small simulation study in Section 5.2. 
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4.2 Measures of Discrimination for Multiple Classes 


When predicting a binary outcome or class label, a widely used measure of a model’s ability 
to discriminate between the two class labels is the concordance index or c-statistic. If we 
let Zi G {1,2} denote the class labels and let Pi(c) denote the predicted probability that 
Zj = c, c = 1,2, then the c-statistic C 12 is dehned as 

Ci2(Z, p(l), p(2)) = 


isAi j(zA2 


2N1N2 ^ 


i&Ai j&A2 


( 21 ) 

where Ac = {i Zi = c} is the set of subjects with class label c and W = = c}. 

Letting pj = [pj(l),pj(2)]^, the c-statistic has the form ([T7D with C = 2 and kernel 


Zi 

Pi 


Z 2 

P2 


— l{Zi — 1}1{Z2 — 2}( l{pi(l) > P2(l)} + 


i{:pi(i) =:P2(i)} \ 
2 /■ 


( 22 ) 


When interest lies in assessing the discriminatory capabilities of a classihcation method for 
a multi-class problem, it may be useful to consider r nulti-class extensions of the c-statistic. 
One such extension is the all-pairwise c-statistic (see lHand and Tilll (120011 )) which is simply 
the average of the usual pairwise c-statistics (1^ taken over all possible pairs of classes. 
For a O-category outcome, the all-pairwise c-statistic (APCc) is dehned to be 

APCc=(0 5^Ci,(Z,p(l),p(j)), (23) 


k<j 


where Ckj{-) is dehned as in fl2l|) and is computed using only subjects from classes k and j. 
Like the c-statistic, the all-pairwise c-statistic lies between 0 and 1 with larger values 
indicating better discriminatory performance and with values near 0.5 indicating that the 
model performs no better than predicting labels at random. 

Other multi-class discrimination measu res include the v olume under the ROC surface 
and “one-versus-rest” measures (see e.g., iMossmaru fll999l) j. One discrimination index 


which uses the entire set of predicted probabilities rather than averagin g over the pairwise 


Van Calster et af 


measu res is the Polytomous Discrimination Index (PDI) proposed in 
(120121) . Before providing the dehnition of the PDI, we hrst let pi = (pi(l),... ,Pj(C)) denote 
the subject’s vector of predicted probabilities with Pi(c) representing the predicted 
probability that subject i belongs to class c. Then, for a C-class model, the PDI is dehned 


13 

















Scenario I Scenario ii 



0.25 

0.50 

0.75 

1.00 1.25 

1.50 1.75 

2.00 

0.25 

0.50 

0.75 




time 





time 


Figure 1: Mean latent trajectories for the two central simulation scenarios. In Scenario I, 
observations for each subject are made at each of the eight time points tij = j/4, j = 1,..., 8, and 
in Scenario II, observations for each subject are made at each of the five time points tij = j'/4, 
j = 1,..., 5. For both scenarios, the class-membership proportions are as follows: Class 1: 50%, 
Class 2; 25%, Class 3; 15%, and Class 4; 10%. 
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to be 


° CN, ■ Nr ^ ■■■ ^ E9=(P-'""P-b. (24) 

^ ii&Ai ic&Ac c=l 

where Ac = {i : Zi = c} is the set of subjects in class c and where gc{pii, ■ ■ ■, Pic) equals one 
if pi^{c) > pij{c) for all j ^ c, and equals zero if there is a. j* ^ c such that Pj^(c) < p^., (c). 
If (pi^(c),... ,pj^(c)) does not contain a unique maximizer and pi^{c) is one of those tied 
for the maximum value, then one sets gdpi^^,... ,Pic) = 1/t, where t is the number of 
cases tied with pi^{c). Unlike the c-statistic or all-pairwise c-statistic, a method producing 
predictions at random will generate a PDIc value near 1/U rather than 0.5. 

Table 1: Class separation indices for each of the two central scenarios with several different 
values of the autocorrelation and scale parameters. The class separation indices are computed 
using both the all-pairwise c-statistic (APCc) and the polytomous discrimination index (PDIc) 
as measures of classification performance. 

Scenario I Scenario II 

0 = 1.25 0 = 3.0 0= 1.25 0 = 3.0 



a = 0.1 

a = 0.4 

a = 0.1 

a = 0.4 

a = 0.1 

a = 0.4 

a = 0.1 

a = 0.4 

APCc 

0.976 

0.944 

0.922 

0.892 

0.900 

0.867 

0.828 

0.802 

bd 

O 

1—1 
o 

0.934 

0.872 

0.812 

0.756 

0.775 

0.712 

0.646 

0.608 


5 Simulations 

5.1 Autoregressive Models with Four Latent Classes 

Design 

To evaluate the performance of our estimation procedure and to test our implementa¬ 
tion, we performed a simulation study using two central scenarios (Scenarios I and II) as 
our point of reference. Each of Scenarios I and II involves a model with fonr latent classes 
where the within-class model is the antoregressive negative binomial model described in 
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Figure 2: Expected empirical discrimination (EED) and Class Separation Indices (CSI) for 
Scenarios I and II of the INAR(1)-NB model. Values of the EED are shown when the parameters 
are estimated from simulations with m = 200, m = 500, m = 2,000 subjects. 


Sections 12.11 and 12.21 In Scenario I, each subject is observed over 8 equally spaced time 
points, and in Scenario II, subjects are observed over 5 time points. As shown in Figured! 
the latent trajectories for both of these scenarios are qualitatively similar. The choice of 
four classes for the simulation scenarios is meant to reflect the wide use of four-class mod- 
els when ideri t ifying s ubtypes i n the childhood development of conduct disorders (see, e.g., 
Odgers et al.l (120071 1 or lMoffittI (119931) b The goals of the simulation study include evaluating 


the classihcation performance associated with using the estimated posterior probabilities, 
examining the empirical bias of parameter estimates, quantifying the degree to which the 
standard errors provide approximately desired coverage levels, and assessing how each of 
these operational characteristics vary as a function of the class separation index. 

The values of the class separation index for Scenarios I and II and their variants are 
displayed in Tabled! For each of the two scenarios, we varied the autocorrelation parameter 
across two levels, a G {0.1, 0.4}, and varied the scale parameter across two levels, 0 G 
{1.25, 3}. As may be inferred from Tabled! higher values of the scale parameter and higher 
levels of autocorrelation reduce the class separation and thereby the inherent ability to 
correctly classify individuals. Also, for each parameter conhguration (i.e., a particular 
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scenario and choice of {a, 0)), we ran onr estimation procedure for each of the sample sizes 
m = 2, 000, m = 500, and m = 200. For each parameter conhguration and sample size, we 
computed estimates on each of 200 replicated data sets. To determine convergence with a 
given tolerance level e, we used the criterion max^ |m“^G^(©, 7r)| < e, where G(©,7r) is 
as dehned in Section [221 and G*'(©,7r) denotes the element of G(©, tt). 

Because the class labels are not identihable, for each model htted, we permuted the 
class labels of the estimated parameters in order to minimize the L 2 distance between the 
estimated and true mean curves. That is, for a set of estimates © = { 61 ,... , 64 ) and 
TT = (tti, ..., 714 ) obtained through our quasi-EM procedure, we computed the optimal 
permutation according to 

4 

s* = argmin V (25) 

where V simply denotes the set of permutations of class labels (1,2, 3,4). We then com¬ 
puted the hnal estimates of the parameters through © = ..., 0 s*( 4 )) and tt = 

(vr^qi), ..., 7r5»(4)). Note that /x'^ = exp(Xj/3c) and = exp(Xj/3^) do not depend on i 
since all subjects share the same design matrix in our simulation scenarios, though this 
need not be the case in real applications. 

Results 

Figure [2] displays the discriminatory performance (using both the polytomous discrim¬ 
ination index and the all-pairwise c-statistic) of our estimation procedure across the eight 
simulation settings. For settings with highly distinct classes, the EED obtained from using 
the estimated posterior probabilities compares favorably with the oracle procedure even 
for relatively modest sample sizes (i.e., m = 200 and m = 500) though the gap between 
the EED and the oracle-based CSI tends to widen noticeably as the classes become less 
distinguishable. Notably, the m = 2, 000 simulations illustrate that, for highly distinct 
classes and reasonably large sample sizes, the discriminatory performance of our procedure 
is nearly identical to that of the oracle procedure. 

Figures [3] and m illustrate how the underlying separation among the classes influences 
other properties of estimation. Figure [3] displays the absolute empirical bias for all regres¬ 
sion coefficients and for all eight simulation scenarios. The empirical bias for a particular 
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Figure 3: Absolute empirical bias for the regression coefficients using data simulated under 
eight settings of the INAR(1)-NB model (i.e., each of the scenarios from Tabled]). The simulation 
settings are ordered according to the class separation index with simulation setting 1 having the 
highest class separation and simulation setting 8 having the lowest class separation. The values 
of the empirical bias were obtained using 200 replications for each of the eight scenarios and each 
choice of the number of subjects (either m = 200 or m = 2,000). 
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Figure 4: Coverage proportions (using 95% confidence intervals) for the regression parameters 
and mixtnre proportions using data simulated under the eight settings of the INAR(1)-NB model 
(i.e. each of the scenarios from Table [T]). The simulation settings are ordered from left to 
right according to the class separation index with simulation setting 1 having the highest class 
separation and simulation setting 8 having the lowest class separation. Coverage proportions are 
shown for simulations with m = 200 subjects and m = 2 ,000 subjects. 
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parameter estimate is found by taking the absolute value of the difference between the true 
parameter value and the mean of the simulated estimates. Figure [3] shows that, for high 
values of the CSI, the empirical biases are packed closely to zero but spread out consider¬ 
ably as the CSI declines. Similarly, in Figure HI we can see that the computed confidence 
intervals generally give the desired 95% coverage for large sample sizes (m = 2, 000) in 
highly separated settings. However, the level of coverage and variability in coverage tends 
to deteriorate as the class separation decreases. 


5.2 Poisson Outcomes with Normal Random Effects 


To evaluate the performance of our proposed htting procedure under model misspecifi- 
cation, we performed several simulations involving latent class models where within each 
class we generate data under a generalized linear mixed model with Poisson responses 
and Normal random effects. The motivation for this simulation exercise is to assess how 
well our method classifies individuals to latent classes when this alternative model holds 
rather than the AR(1) model. Because we can compute the class separation index for any 
Poisson-Normal model via ffTSjl . comparing the FED obtained from our estimated poste¬ 
rior probabilities to the CSI will provide an indication of how well class assignments are 
recovered under model misspecification. 

In these simulations, conditional on a subject-specihc random slope, intercept, and class 
label, the response of each subject was generated according to a Poisson distribution. In 
particular, for subject i in class c, the response was generated as Yij ~ Poisson(Ajj) where 
log(Ai^) = (3q+ ttio + ttiitij, j = 1,... ,T, and where ( 0 , 0 , an) are jointly Normal 
random variables with 0*0 and Cov(ajo,aii) = 

— [(T — 1)ct^^]/ 2. This means that the mean trajectories are quadratic on the log scale, viz. 


log (E{Y,j\Z, = c)) = + ^ + + 




l3 


(J. 


t.i + -fttr (26) 


As in the simulations of Section 15.11 we used four latent classes c = 1,...,4 for each 
simulation setting. In total, we considered four simulation settings: one with eight time 
points and the other three having five time points. In each of these, the parameters were 
chosen so that the mean trajectories were similar to those in Scenario I and Scenario II. 
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Figure 5: Expected empirical discrimination (EED) and Class Separation Indices (CSI) for 
Poisson outcomes with Normal random effects models. The values of the EED are shown when 
class assignment is based on fitting the INAR(1)-NB model with m = 500 and m = 2, 000 subjects. 
Eifty replications were used for each simulation setting and choice of m. 

The parameter values used for each of these four simulations settings are provided in the 
appendix. For each setting of the Poisson-Normal model and each simulation replication, 
we fit a four-class integer autoregressive model assuming that, within each class, the mean 
trajectory was quadratic on the log scale. 

The values of the class separation index and expected discrimination shown in Figure [5] 
suggest that our procedure is fairly robust to this form of misspecification. In particular, 
comparison of Figure [5] to Figure [2] indicates that the difference between the expected 
discrimination and the CSI does not suffer greatly under model misspecification for the 
range of alternative models and CSI values considered here. In addition, the high expected 
discrimination in settings with a high CSI signals that our procedure will still perform well 
under misspecification as long as the underlying classes are clearly separated. 

We are also interested in the degree to which we recover the latent trajectories in the 
Poisson-Normal model. Because of the form of the latent trajectories in fl26l) . we can 
examine the empirical bias of the regression coefficient estimates obtained from the fit of 
the integer autoregressive model. Figure ([6]) displays the empirical bias of the regression 
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Figure 6: Empirical bias for the regression coefficients when fitting an INAR(1)-NB model to 
data simulated under the four settings of the Poisson-Normal model. Computation of the bias 
uses the fact that the mean is quadratic on the log-scale (see (l26|) i. The simulation settings are 
ordered according to the class separation index with simulation setting 1 having the highest class 
separation and simulation setting 4 having the lowest class separation. 

coefficient estimates across the four settings of the Poisson-Normal simulations. Although 
these results are not directly comparable to those in Figure |3] due to the different number of 
replications and different mean model, Figure |6] suggests that we recover the mean model 
reasonably well for settings with a high class separation index. 

6 Application to CNLSY Data 

6.1 Description and Model Fitting 

In this section, we consider data collected on Children of the National Longitudinal Study 
of Youth (CNLSY). The National Longitudinal Study of Youth (NLSY79) is a longitudinal 
study initiated in 1979 on a nationally representative sample of young adults. In 1986, the 
NLSY launched a survey of children of female subjects from the original NLSY79 cohort. 
Assessments of the children of the NLSY were performed biennially, and in each assessment, 
mothers were asked to respond to a series of questions regarding each of their children’s 
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behavior and environment. 

Althongh the mothers were asked to respond on a wide variety of measnres, our focus lies 
in the severity of behavioral problems as measured by the “Behavior Problems Index” (BPI) 
in early childhood and by the number of delinquent acts committed during the adolescent 
years. The BPI is recorded for children ages 4 — 13 and is constructed by asking the mother 
to rate her child on a scale of 0 to 2 on each item from a list of seven common behavioral 
problems. Consequently, this yields a BPI value which is an integer between 0 and 14. 
Starting at the age of 10, the children were also asked to report the number of delinquent 
acts they committed during the past year. From age 14 onward, the mothers were no longer 
asked to respond to the BPI questions, leaving only the self-reported delinquency counts 
as a measure of behavioral conduct for children older than 13. 

Table 2: Summary statistics from the CNLSY data. In total, these data contain 9,626 subjects 
each of which was surveyed biennially over the ages 4 to 16 (or 5 to 17). For the age groups 4-5, 6-7, 
and 8-9, the counts are solely from the behavioral problems index (BPI). For age groups 10-11 and 
12-13, the counts represent the sum of the mother-reported BPI and the child-reported number 
of delinquent acts. For age groups 14-15 and 16-17, the counts are solely from the self-reported 
number of delinquent acts. 


Child Ages 

Mean 

10% 

25% 

50% 

75% 

90% 

Max 

4-5 

2.35 

0 

1 

2 

4 

6 

14 

6-7 

2.12 

0 

0 

2 

3 

5 

14 

8-9 

2.15 

0 

0 

2 

3 

5 

14 

10-11 

3.46 

0 

1 

3 

5 

8 

19 

12-13 

3.74 

0 

1 

3 

5 

8 

20 

14-15 

1.36 

0 

0 

1 

2 

4 

7 

16-17 

1.37 

0 

0 

1 

2 

4 

7 


To model the evolution of behavioral problems throughout childhood and adolescence, 
we added the BPI and the delinquency counts into a single response variable at each time 
point. For children less than 10, we used the BPI as the only response, and for children 
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aged 10-13 we took the response to be the sum of the delinquency counts and the BPI. 
For children older than 13 the response is simply the delinquency counts. To account for 
this methodological feature of the measurement process, we added a dummy variable for 
the time points corresponding to the age groups 10-11 and 12-13. In addition, because the 
mean of the delinquency counts is lower than the mean of the BPI (see Table [2]), we added 
another dummy variable for children older than 13. 

We modeled the subject-specihc trajectories with denoting the 

mean response of subject i at time tij conditional on belonging to class c, as 

3 

log(/i?,) = + = 4} + l{t,^ = 5}) + > 6}. (27) 

k=l 

In (l27|) . are the B-spline basis functions of degree 3 with no interior knots. We 

coded the time variable as follows: = 1 for children ages 4-5, = 2 for children ages 6-7 

with the remaining hve time points coded similarly. To handle subjects that had missing 
responses, we assumed that the correlation only depended on the order of the observed 
responses. For example, if subject i was observed at times 1, 2, and 4 with a missing value 
at time point 3, then we would have Corr 6 )^(|/j 4 , 7 /^ 2 ) = C(c and CoTT 0 ^{yi 4 ,yii) = a^. 

The CNLSY provides sampling weights which are estimates of how many individuals 
in the population are represented by a particular subject in the data. Hence, the sampling 
weights reflect the inverse probability that a subject was included in the sample. To account 
for this aspect of the sampling process, we htted latent class models where, within each 
iteration of the algorithm, we solve a weighted version of ffTTj) and evaluate convergence 
with a weighted version of the estimating function G(0,7r) dehned in Section lT2l This 
modihed version of the htting procedure that accounts for sampling weights is described in 
more detail in the appendix. 

We applied our procedure to the CNLSY data varying the number of classes from 3 to 6 
where, in each case, we modeled the mean trajectories with fl27)) . As is advisable in latent 
class modeling applications, for each choice of the number of classes, we repeated the esti¬ 
mation procedure with 20 different starting values; in each case, the weighted log-likelihood 
converged to the same maximum value for at least two runs. When comparing the best 
runs of these four different models (i.e., the 3 to 6 class models), the four-class model pos¬ 
sessed the lowest weighted BIC though the four and hve-class models had nearly identical 
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Figure 7: Estimated trajectories for the CNLSY data assuming four latent classes. The terms 
in parentheses represent estimated class-membership proportions. The left panel displays the 
estimated trajectories adjusted for the different measurement scales used at different time points; 
these can be thought of as the fitted latent trajectories on the mother-reported BPI measurement 
scale. More specifically, the fitted curves in the left panel do not include the time indicators 
present in equation ([27|l while the right-hand panel displays the fitted trajectories associated with 
the full model in equation ([27]) . 
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values of the weighted BIC. In particular, the best values of the weighted BIG for the 3 
to 6 class models were 161,751.3, 161,651.2, 161,651.4, and 161,681.0 respectively. The 
htted trajectories and estimated class-membership proportions from the four-class solution 
are displayed in Figure [TJ and here we labeled the four classes to roughly reflect increasing 
levels of conduct disorder severity with Class 1 (4) denoting the least (most) severe class. 
The right-hand panel of Figure [7] displays the estimated mean curves (/iii, ■ ■ ■, includ¬ 
ing all the predictors from fl27)) . The left-hand panel displays the mean curves obtained by 
excluding the time point indicators Ijtp- = 4 or tij = 5} and > 6}, and is intended 
to reflect population trajectories in terms of the BPI only. 

Table 3: Weighted cross-tabulation of random class assignment and maternal age at the birth of 
the child using only male subjects, and weighted cross-tabulation of class assignment and of ever 
being convicted of a crime during ages 14 — 18 using only male subjects. The class assignments 
were obtained by using the estimated posterior probabilities to randomly assign each subject to 
one of the four latent classes. The random assignment procedure was repeated 1,000 times, and 
the results were averaged. In the top table, we display in parentheses class proportions conditional 
on maternal age while in the bottom table we show conviction proportions conditional on class. 



Class 1 

Class 2 

Class 3 

Class 4 

maternal age < 20 

maternal age > 20 

34.2 (0.178) 

968.7 (0.284) 

44.8 (0.233) 

941.7 (0.276) 

40.2 (0.209) 

581.7 (0.171) 

73.1 (0.380) 

917.6 (0.269) 



Class 1 

Class 2 

Class 3 

Class 4 

ever convicted-yes 

ever convicted-no 

18.7 (0.031) 

588.0 (0.969) 

16.5 (0.025) 

646.8 (0.975) 

42.6 (0.105) 

363.1 (0.895) 

102.9 (0.144) 

609.4 (0.856) 


Additional variables in the CNLSY such as gender and criminal history allow us to ex¬ 
amine associations between these variables and the latent classes identihed in the four-class 
solution. Because this is an analysis that is relevant to the domain area of developmental 
psychopathology, these associations are critical for substantive validation of the four ex- 
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tracted classes. To investigate these associations, we randomly assigned each subject to one 
of the four identihed classes using the estimated posterior probabilities of class membership 
and cross-tabulated these assignments with other variables in the CNLSY. Table |3] displays 
a weighted contingency table of the random class assignment and maternal age at birth 
(< 20 years old, or > 20 years old) only for male subjects, and the resulting frequencies 
seem to support the validity of the four identihed classes as the proportion of subjects in 
classes 1 or 2 with maternal age > 20 is considerably higher than the proportion of subjects 
in classes 1 or 2 with maternal age < 20. Table [3] also shows a weighted cross tabulation 
of class assignment and of ever being convicted of a crime between ages 14 — 18, and as 
demonstrated by this contingency table, the prevalence of criminal outcomes in classes 3 
and 4 is substantially higher than in classes 1 and 2. Moreover, the frequency of criminal 
outcomes in those assigned to class 4 is considerably greater than that of the other three 
classes. 


6.2 Diagnostics 


Whereas there are a variety of aspects of the model which we could assess, our interest here 
lies in checking whether our specihcation of the within class distribution is reasonable. In 
particular, we are especially interested in examining if the assumed within class correlation 
structure seems to hold in the CNLSY data; that is, we want to check if Corr(|/jfc, yij\Zi = c) 
decays exponentially over time. If the class labels were known, we could check this by 
directly stratifying subjects into their respective classes and computing the desired corre¬ 
lations. We mimic this process by using each subject’s estimated posterior probabilities 
to randomly assign each subject to one of the latent classes. Such a diagnostic approach, 
where within-class quantities are checked by sa mpling from the estimated po sterior proba¬ 


bilities, is similar to the procedure discussed in 


Bandeen-Roche et ah 


(1997). As shown in 


that paper, this procedure is valid for detecting departures from the model if the assumed 
latent class model is indeed false. 

Estimated autocorrelation functions obtained from the random stratihcation procedure 
described above are displayed in Figure [8] with the same class labels as in Figure [71 For each 
random stratihcation, we used the subject-specihc sampling weights to compute weighted 


27 






Figure 8: Sample autocorrelation functions (weighted) obtained by using the estimated posterior 
probabilities of class membership to randomly assign each subject to one of the four latent classes. 
The random assignment procedure was repeated 1,000 times; the displayed autocorrelation values 
represent the average autocorrelation values from these replications. 



Figure 9: Sample over dispersion (weighted) obtained by using the estimated posterior prob¬ 

abilities of class membership to randomly assign each subject to one of the four latent classes. 
This random assignment procedure was repeated 1,000 times; the displayed over dispersion values 
represent the average over dispersion values from these replications. 





estimates of the autocorrelation and then averaged the results over 1, 000 replications of 
the random stratihcation process. The autocorrelation plots in Figure [7] show that the 
AR(1) structure seems somewhat plausible for the CNLSY data in that the within-class 
correlations decay substantially over time. However, for most classes, the correlations do 
not seem to decay quite as quickly as would occur under an AR(1) assumption. A similar 
diagnostic plot for the scale parameters cfyc is shown in Figure |9l This shows estimates of 
over dispersion for each time point obtained through repeatedly performing random class 
assignment. The plots in Figure [3 suggest that the original estimates of 0c are reasonable 
and that the level of over dispersion is relatively constant over time. 

7 Discussion 

We have presented a method for performing latent class analysis on longitudinal count data. 
The motivation behind this work has been to express many of the core features of latent 
class models or growth mixture models in a straightforward, computationally tractable 
framework that will improve computational performance and model identifiability, and 
that will perform well even if the true data generating mechanism is the popular growth 
mixture model. The autoregressive structure used in our model serves both as a natural 
way to model dependence over time and to achieve clearer separation of correlation due to 
the repeated-measurements structure and correlation due to the latent class structure. In 
terms of computation, one of the chief advantages of this approach is the availability of the 
subject-specific likelihood in closed form; this simplihes computation considerably at least 
when compared with procedures that employ random effects and which require the use of 
numerical integration procedures within each class. In addition, because computational 
efficiency has been a primary motivation, we described a quasi-EM approach which we 
found to be especially useful in this setting. 

We also have offered novel notions of class separation inherent in the data and of a given 
modeling procedure to recover that level of discrimination. Based on model classification 
indices such as those used in classihcation regression models, the oracle-based CSI quan- 
tihes the degree to which the information in the data can correctly classify subjects given 
the correct model and parameters. The EED, by contrast, quantifies the degree to which a 
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given procedure and sample size can correctly classify subjects; the EED can be compared 
to the CSI, the latter acting as an upper bound. We found excellent classihcation perfor¬ 
mance by applying our modeling procedure to Poisson-normal random effects latent class 
model data. Our model performed very nearly as well as it did when our model was the 
data generating mechanism with comparable CSI values, although the range of alternative 
models considered here is quite limited and investigating other forms of misspecihcation 
would certainly be worthwhile. 
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A Appendix 


A.l Transition Function 


The transition fnnction Xj, 0c) = PiiVijlVij-u ^c) can be expressed as the con- 

volntion of a beta-binomial distribntion and a negative binomial distribntion 


Pi{yij\yi,j- 1 ] Oc) = ^ fiiyij - k)gi{k). 

k=0 


(28) 


Above, fi{-) represents the probability mass fnnction for a beta-binomial distribntion with 
parameters (yij, acPlj_ihc, (1 - «c)Kj_i/7c 


m = 


r(2/ij-i + l)T{plj^)T{k 4- ac)vlj i)T{yij i + (1 - OcXj-i) 


(29) 


T{k + l)T{yij_i -k + l)T{acVlj_i)T{{l - «cXi-i)r«j_i + yi,j-i) 

and gi{-) represents the probability mass fnnction for a negative binomial distribntion with 
mean — etc) and variance — ac)(l + 7c) 

T{k + {1 - ac)ptj/lc} ( 7c ^ 7^ \fc 


r{fc-4l}r{fc-4 (1 - Q;c)/i7/7c} '^1+ 7c 
By applying (|28ll . the transition fnnction can then be written as 


1 + 7t 


(30) 




- Afj + yi,j-i - k)r{gf- - Xr. + y^j - k) 

k=0 


\-n,3 

- E 


V 7c / -k + l)T{yij -k + l)T{k -4 1) 

where = ac04pfJ7I/7c, vfj = pIj/Xc and 


A.2 Parameter Initialization 

The parameter initialization procednre is detailed below 

1. Choose K “cluster centers” for the regression coefficients (/3^,... This is done 

by randomly selecting K subject and htting a separate Poisson regression for each 
subject. 
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2. Assign each subject to one of the K classes through Si = argniin{Zi)(yi; j3^)}. Here, 

C 

D{yi] (3^) = —2 log L{(3^\yi) is the usual deviance associated with a Poisson regression. 
Namely, A>(yi; /3J = 2 “ Vij log(/^fj) + log(l/uO) • 

3. Using this hard assignment of subjects to clusters, compute a new value of (3^ by 
fitting a Poisson regression for the subjects in the set Sc = {i '■ Si = c}. Do this for 
each cluster c = 1,... ,K. 

4. Repeat steps (2)-(3) twice. 

5. Compute the mixture proportions through tTc = ^ = c}, and compute each 

6c = (/3c, ttc, 7c) by solving J2ieSc = 0, where Ui{-) is as described in equation 

dni). 

A.3 Estimation with Sampling Weights 

Suppose Vi, i = l,...,m are sampling weights such that Vi is proportional to the in¬ 
verse probability that subject i is included in the sample. We compute initial estimates 
(©^^^TT®) in the same way as the unweighted case. Given estimates we pro¬ 
duce updated estimates in the {k + 1)^* step through the following process. 

• Update 6c = {(3^, ac, 7c) by solving 

m 

Y,ViWic{@^^\Tz^’^^)U,{6c) = Q, (31) 

i=l 

where Uj(-) is the estimating function described in equation fll2p . 

• Update Tic through 

' “ EEU ' ' 

We determine convergence by stopping when the weighted estimating function DGi(®, tt) 
is sufficiently close to zero. 
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Table 4: Values of the regression coefficients in Scenarios I and II. 


Scenario I 

Scenario II 


C=1 

c=2 

c=3 

c=4 

C=1 

c=2 

c=3 

c=4 

^0 

-0.4 

1.5 

0.0 

1.4 

-0.4 

1.4 

0.0 

1.2 

PI 

-0.1 

-0.7 

0.65 

0.0 

-0.1 

-1.0 

0.9 

0.0 


A.4 Simulation Parameter Values 

In Scenarios I and II of the INAR simulations, we dehne the mean curves through log(/r^j) = 
The values of (31) used for Scenarios I and II are given in Table 01 
The Poisson-Normal simulations use four simulation settings. One of these settings 
(the setting with the largest class separation index) has eight time points with time points 
tij = j/9 for j = 1,..., 8. The other three settings have five time points with tij = j/6 for 
j = 1,..., 5. Each setting utilizes the parameters (/9 q, (31, tTc) for c = 1,..., 4. The 

values of these parameters for each simulation setting (as indexed by the value of the CSI) 
are shown in table [5] 
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Table 5: Parameter values used for the Poisson-Normal simulations. The Class Separa¬ 
tion Index (CSI) shown for each simulation setting was computed with the polytomous 
discrimination index. 



CSI=0.957 

CSI=0.842 

CSI=0.765 

CSI = 0.633 

/So' 

-0.90 

-0.90 

-0.90 

-9.00 

/Si' 

-0.35 

-0.35 

-0.35 

-0.35 


0.30 

0.40 

0.85 

1.50 

(^h 

0.125 

0.20 

0.50 

0.70 

TTl 

0.50 

0.50 

0.50 

0.50 

/So^ 

1.55 

1.55 

1.55 

1.3 

/S? 

-2.10 

-2.10 

-2.10 

-2.10 

(^20 

0.08 

0.15 

0.35 

1.00 


0.05 

0.075 

0.25 

0.50 

T^2 

0.25 

0.25 

0.25 

0.25 


-0.40 

-0.65 

-0.65 

-0.7 

/S? 

1.90 

2.00 

2.00 

1.75 

^30 

0.10 

0.20 

0.60 

1.25 


0.06 

0.075 

0.25 

0.55 

TTS 

0.15 

0.15 

0.15 

0.15 

/So" 

1.40 

1.25 

1.25 

1.00 

/Si" 

-0.05 

-0.05 

-0.05 

-0.05 

f^lo 

0.06 

0.10 

0.35 

1.00 

all 

0.04 

0.04 
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0.30 

0.45 


714 


0.10 


0.10 


0.10 


0.10 











References 


Baker, P. C. and Mott, F. L. (1989). NLSY child handbook 1989: A guide and resource 
doeument for the National Longitudinal Study of Youth 1986 Child Data. Center for 
Human Resource Research. 

Bandeen-Roche, K., Miglioretti, D. L., Zeger, S. L., and Rathouz, P. J. (1997). Latent 
variable regression for multiple discrete outcomes. Journal of the American Statistical 
Association 92, 1375-1386. 

Bockenholt, U. (1999). Analyzing multiple emotions over time by autoregressive negative 
multinomial regression models. Journal of the American Statistical Association 94, 757- 
765. 

Breslow, N. and Clayton, D. (1993). Approximate inference in generalized linear mixed 
models. Journal of the American Statistical Association 88, 9-25. 

Chase-Landsdale, P. L., Mott, F. L., Brooks-Gunn, J., and Phillips, D. A. (1991). Children 
of the national longitudinal study of youth: A unique research opportunity. Develop¬ 
mental Psyehology 27, 918-981. 

Crowder, M. (1986). On consistency and inconsistency of estimating equations. Economet- 
rie Theory 2, 305-330. 

Elashoff, M. and Ryan, L. (2004). An EM algorithm for estimating equations. Journal of 
Computational and Graphieal Statisties 13, 48-65. 

Hall, D. B. and Severini, T. A. (1998). Extended generalized estimating equations for 
clustered data. Journal of the American Statistical Association 93, 1365-1375. 

Hand, D. J. and Till, R. J. (2001). A simple generalisation of the area under the ROC 
curve for multiple class classification problems. Machine Learning 45, 171-186. 

Heagerty, P. J. (1999). Marginally specihed logistic-normal models for longitudinal binary 
data. Biometrics 55, 688-698. 


35 



Heyde, C. C. (1997). Quasi-Likelihood and Its Application: A General Approach to Optimal 
Parameter Estimation. Springer-Verlag, New York. 

McKenzie, E. (1986). Autoregressive moving-average processes with negative-binomial and 
geometric marginal distributions. Advances in Applied Probability 18, 679-705. 

Mofiitt, T. E. (1993). Adolescence-limited and life-course-persistent antisocial behavior: A 
development taxonomy. Psychological Review 100, 674-701. 

Mossman, D. (1999). Three-way ROCs. Medical Decision Making 19, 78-89. 

Muthen, B. (2004). Latent variable analysis: Growth mixture modeling and related tech¬ 
niques for longitudinal data. In Kaplan, D., editor. Handbook of guantitative methodology 
for the social sciences, pages 345-368. Sage Publications. 

Muthen, B. and Shedden, K. (1999). Finite mixture modeling with mixture outcomes using 
the EM algorithm. Biometrics 55, 463-469. 

Odgers, C. L., Caspi, A., Broadbent, J. M., Dickson, N., Hancox, R. J., Harrington, H., 
Poulton, R., Sears, M. R., Thomson, W. M., and Moffitt, T. E. (2007). Prediction 
of differential adult health burden by conduct problem subtypes in males. Archives of 
General Psychiatry 64, 476-484. 

Qu, Y., Tan, M., and Kutner, M. H. (1996). Random effects models in latent class analysis 
for evaluating accuracy of diagnostic tests. Biometrics 52, 797-810. 

Schall, R. (1991). Estimation in generalized linear models with random effects. Biometrika 
78, 719-727. 

Van Calster, B., Van Belle, V., Vergouwe, Y., Timmerman, D., Van Huffel, S., and Steyer- 
berg, E. W. (2012). Extending the c-statistic to nominal polytomous outcomes: the 
polytomous discrimination index. Statistics in Medicine 31, 2610-2626. 

Zeger, S. L. and Karim, M. R. (1991). Generalized linear models with random effects: A 
Gibbs sampling approach. Journal of the American Statistical Association 86, pp. 79-86. 


36 



Zeger, S. L. and Liang, K.-Y. (1986). Longitudinal data analysis for discrete and continuous 
outcomes. Biometrics 42, 121-130. 


37 



